================================================================================
Documentation: Contiguous United States County Population and Area-Weighted Daily ERA5-Land Temperature 1970 to 2024
================================================================================

---------- OVERVIEW ------------------------------------------------------------

Daily time series of minimum, mean, and maximum 2 m air temperature are processed
from ERA5-Land 9 km rasters to United States counties, as defined in the
2020 US Census TIGER/Line shapefile. Both area-weighted and population-weighted
estimates are included. Population weighting is based on the Global Human Settlement
Layer (GHS-POP, https://human-settlement.emergency.copernicus.eu/ghs_pop2023.php).
Population is available in 5-year increments, and each increment is applied
retrospectively to the preceding years (i.e., population estimates for 1975 are
used to weight temperature data for 1970 to 1975, and population estimates for
1980 are used to weight temperature data for 1976 to 1980).
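The retrospective epoch rule above can be expressed as a small lookup from year to population epoch. This is a minimal Python sketch, assuming each year is weighted by the first GHS-POP epoch at or after it, with 1970 falling back to the earliest epoch (1975). The helper name `ghs_pop_epoch` is hypothetical, and the mapping of 2021 to 2024 onto the 2025 epoch follows from the stated rule rather than from the processing code itself.

```python
def ghs_pop_epoch(year: int) -> int:
    """Return the GHS-POP epoch used to weight temperatures in `year` (hypothetical helper).

    Assumes 5-year epochs starting in 1975, applied retrospectively:
    a year is weighted by the first epoch at or after it.
    """
    # Round up to the next multiple of 5 with integer ceiling division.
    epoch = -(-year // 5) * 5
    # The earliest GHS-POP epoch is 1975, so 1970 also uses the 1975 epoch.
    return max(epoch, 1975)


if __name__ == "__main__":
    for y in (1970, 1975, 1976, 1980, 2024):
        print(y, ghs_pop_epoch(y))
```

Under this rule, 1970 through 1975 map to 1975, 1976 through 1980 map to 1980, and so on.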

See below for references and further details on the processing code.

---------- FILE METADATA -------------------------------------------------------
File name:              "county_agg_era5_nationwide_<YYYY>.rds"
File created by:        Z. Popp (zpopp@bu.edu)
Date added:             3 December 2025
Date last modified:     
Note: The annual RDS files (one per year, covering all counties) are stored in decadal zip files.

---------- DATA EXTENT/RESOLUTION ----------------------------------------------

Time extent:            1970-2024
Time resolution:        daily

Spatial extent:         United States
Spatial resolution:     County 

---------- CHANGES BY VERSION NUMBER -------------------------------------------
v01     No changes; this is the original dataset


---------- VARIABLES -----------------------------------------------------------
GEOID			Unique county geographic identifier. Temperature estimates
			align with the 2020 US county boundaries from the US Census
			TIGER/Line shapefile.
date			Date (YYYY-mm-dd) of temperature estimates. Daily min, mean,
			and max are computed over the 24 hours from midnight to
			midnight in the local time zone.
t2m_mean_areawt		Daily mean temperature for date, with areal weight based on 
			ERA5-Land grid cell and US county shapefile.
t2m_mean_popwt		Daily mean temperature for date, with population weight based on 
			GHS-POP population by ERA5-Land grid cell and US county shapefile.
t2m_max_areawt		Daily maximum temperature for date, with areal weight based on 
			ERA5-Land grid cell and US county shapefile.
t2m_max_popwt		Daily maximum temperature for date, with population weight based on 
			GHS-POP population by ERA5-Land grid cell and US county shapefile.
t2m_min_areawt		Daily minimum temperature for date, with areal weight based on 
			ERA5-Land grid cell and US county shapefile.
t2m_min_popwt		Daily minimum temperature for date, with population weight based on 
			GHS-POP population by ERA5-Land grid cell and US county shapefile.
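The `_areawt` and `_popwt` columns differ only in the weight attached to each ERA5-Land grid cell that overlaps a county. A minimal Python sketch of that weighting, assuming area weights are the overlapping area of each cell with the county and population weights are the GHS-POP population in that overlap; the helper `weighted_mean` and all cell values are illustrative, not taken from the processing code.

```python
from typing import Sequence


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of grid-cell temperatures (hypothetical helper)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total


# Three illustrative grid cells overlapping one county on one day.
t2m_mean = [20.0, 22.0, 30.0]        # daily mean temperature per cell (illustrative)
overlap_km2 = [50.0, 30.0, 20.0]     # area of each cell inside the county (illustrative)
population = [100.0, 100.0, 800.0]   # population in each cell's overlap (illustrative)

t2m_mean_areawt = weighted_mean(t2m_mean, overlap_km2)
t2m_mean_popwt = weighted_mean(t2m_mean, population)
print(t2m_mean_areawt, t2m_mean_popwt)
```

Here the area-weighted value is 22.6 and the population-weighted value is 28.2: because most of the population sits in the warmest cell, the population-weighted estimate is pulled toward it.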


---------- RAW SOURCE DATA -----------------------------------------------------
Data Source 1: 	https://cds.climate.copernicus.eu/datasets/reanalysis-era5-land?tab=overview
		ERA5-Land hourly data from 1950 to present. See Metadata About Data Sources
		for details.
		* Accessed 30 October 2025 to 10 November 2025
Data Source 2: 	https://human-settlement.emergency.copernicus.eu/ghs_pop2023.php
		GHS-POP R2023A population grid (multitemporal, 1975-2030).
		* Accessed 5 November 2025; last modified 28 April 2023
Data Source 3:	https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.2020.html#list-tab-790442341 
		2020 TIGER/Line Shapefiles (US counties).
		* Accessed 5 November 2025


---------- DESCRIPTION OF DATA SET DERIVATION ----------------------------------
Note: This code is adapted from https://github.com/Climate-CAFE/era5-daily-heat-aggregation

General steps to process these data:

01A_Query_ERA5_Land_CDSAPI_Batch_Nationwide_YYYY.R 
Uses the ecmwfr package to query hourly ERA5-Land data from the Copernicus
Climate Data Store (CDS). A for loop queries the data in monthly cuts across
the 55-year period of the full dataset, because the CDS API restricts the
amount of data that can be retrieved in a single request.

R packages: ecmwfr, sf, dplyr, keyring, tigris
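The monthly batching described above amounts to iterating over every (year, month) pair in the record. A minimal Python sketch, assuming exactly one request per calendar month; `monthly_cuts` is a hypothetical helper, and the sketch deliberately does not reproduce the CDS API request fields that `ecmwfr` submits.

```python
import calendar
from typing import Iterator, Tuple


def monthly_cuts(start_year: int, end_year: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (year, month, days_in_month) for each monthly query window (hypothetical helper)."""
    for year in range(start_year, end_year + 1):
        for month in range(1, 13):
            # calendar.monthrange returns (weekday of the 1st, number of days in the month)
            n_days = calendar.monthrange(year, month)[1]
            yield year, month, n_days


cuts = list(monthly_cuts(1970, 2024))
print(len(cuts), cuts[0], cuts[-1])
```

Under this one-request-per-month assumption, the 55-year record (1970 to 2024) corresponds to 55 × 12 = 660 requests, each of which would cover all days and hours of its month.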

02A_ERA5_Extraction_Pts.R 
Processes ERA5 rasters to administrative boundaries. 
This script is the first step of a two-step raster processing workflow. In this
script we build a polygon grid (fishnet) of the ERA5 input cells and intersect it
with our polygons of interest. This intersection is used to calculate
area-weighted and population-weighted estimates of heat metrics at the polygon
level. We also perform other processing that only needs to be run once in this
script (i.e., steps that are not time varying, unlike the hourly-to-daily
aggregation). This additional processing includes building a raster that marks
ERA5-Land pixels that are 50% or more water.
More details about the use of a fishnet for spatial aggregation of raster data 
can be found here: https://github.com/Climate-CAFE/population_weighting_raster_data

R packages: terra, sf, plyr, doBy, tidyverse, tigris, lutz, exactextractr
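Marking water-dominated pixels reduces to a threshold on each cell's water fraction. A minimal Python sketch, assuming the water fraction is taken as 1 minus a land-fraction grid such as the ERA5-Land land-sea mask; the source does not state which input the script uses, and `water_mask` is a hypothetical helper operating on nested lists rather than a `terra` raster.

```python
from typing import List


def water_mask(land_fraction: List[List[float]], threshold: float = 0.5) -> List[List[bool]]:
    """Return True for pixels whose water fraction (1 - land fraction) is at least `threshold`.

    Hypothetical helper: assumes `land_fraction` holds values in [0, 1] per pixel.
    """
    return [[(1.0 - lf) >= threshold for lf in row] for row in land_fraction]


# Illustrative 2 x 3 land-fraction grid.
lsm = [
    [1.0, 0.8, 0.5],
    [0.2, 0.0, 0.6],
]
print(water_mask(lsm))
```

Using `>=` treats a pixel that is exactly half water as "50%+ water", matching the wording of the description.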

03_Aggregate_ERA5_Tmax.R 
Processes ERA5 rasters to administrative boundaries. 
This script is the second step of the two-step raster processing workflow. We use
the linked grid cell-polygon points to extract daily temperature values and then
estimate polygon-level area-weighted and population-weighted daily measures.

First, we estimate daily minimum, mean, and maximum temperature from the hourly
data. This requires converting from UTC, the time zone in which ERA5 data are
distributed, to local time, because the daily min, mean, and max should be based
on the local day. The time zone applied to each grid cell is the one detected at
the grid cell centroid with the R 'lutz' package.

R packages: terra, sf, plyr, doBy, tidyverse, tigris, lutz, exactextractr, lubridate
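The UTC-to-local conversion and daily aggregation described above can be sketched with Python's standard-library `zoneinfo` module (Python 3.9+, with a time-zone database available). This is a minimal illustration, not a translation of the R script: it assumes each grid cell already has an IANA time-zone name (the kind of value `lutz` returns), `daily_local_stats` is a hypothetical helper, and partial local days at the start or end of the input are not filtered out.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from statistics import mean
from zoneinfo import ZoneInfo


def daily_local_stats(hourly_utc, tz_name):
    """Aggregate (UTC datetime, temperature) pairs to local-day (min, mean, max).

    Hypothetical helper: `tz_name` is an IANA zone name such as the one
    looked up for a grid-cell centroid.
    """
    tz = ZoneInfo(tz_name)
    by_day = defaultdict(list)
    for ts_utc, temp in hourly_utc:
        # Convert the UTC timestamp to local time, then bucket by local calendar date.
        local_date = ts_utc.astimezone(tz).date()
        by_day[local_date].append(temp)
    return {day: (min(vals), mean(vals), max(vals)) for day, vals in sorted(by_day.items())}


# 48 illustrative hourly values starting at local midnight in New York (05:00 UTC in winter).
start = datetime(2020, 1, 1, 5, tzinfo=timezone.utc)
hourly = [(start + timedelta(hours=h), float(h)) for h in range(48)]
stats = daily_local_stats(hourly, "America/New_York")
print(stats)
```

Grouping on the local calendar date means that, if civil time is followed, days with a daylight-saving transition contain 23 or 25 hourly values rather than 24.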

---------- REFERENCES ----------------------------------------------------------

ecmwfr package reference:

Hufkens, K., R. Stauffer, & E. Campitelli. (2019). ecmwfr: Programmatic interface to the two European Centre for Medium-Range Weather Forecasts API services. Zenodo. http://doi.org/10.5281/zenodo.2647531.

ERA5 Data reference:

Muñoz Sabater, J. (2019): ERA5-Land hourly data from 1950 to present. Copernicus Climate Change Service (C3S) Climate Data Store (CDS). DOI: 10.24381/cds.e2161bac (Accessed on 10-Nov-2025)

Population Data Reference:

Schiavina M., Freire S., Carioli A., MacManus K. (2023):
GHS-POP R2023A - GHS population grid multitemporal (1975-2030). European Commission, Joint Research Centre (JRC)
PID: http://data.europa.eu/89h/2ff68a52-5b5b-4a22-8f40-c41da8332cfe, doi:10.2905/2FF68A52-5B5B-4A22-8F40-C41DA8332CFE


TIGER/Line Shapefiles and Query:

Walker K (2023). _tigris: Load Census TIGER/Line Shapefiles_. R package version 2.0.4,
<https://CRAN.R-project.org/package=tigris>.

Census Bureau (2021). "2020 TIGER/Line Shapefiles." Published on 2 February 2021.
https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.2020.html#list-tab-790442341 